14 research outputs found
Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism
The demise of Moore's Law and Dennard Scaling has revived interest in
specialized computer architectures and accelerators. Verification and testing
of this hardware heavily uses cycle-accurate simulation of
register-transfer-level (RTL) designs. The best software RTL simulators can
simulate designs at 1-1000 kHz, i.e., more than three orders of magnitude
slower than hardware. Faster simulation can increase productivity by speeding
design iterations and permitting more exhaustive exploration.
One possibility is to use parallelism as RTL exposes considerable fine-grain
concurrency. However, state-of-the-art RTL simulators generally perform best
when single-threaded since modern processors cannot effectively exploit
fine-grain parallelism.
This work presents Manticore: a parallel computer designed to accelerate RTL
simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution
model to eliminate runtime synchronization barriers among many simple
processors. Manticore relies entirely on its compiler to schedule resources and
communication. Because RTL code is practically free of long divergent execution
paths, static scheduling is feasible. Communication and synchronization no
longer incur runtime overhead, enabling efficient fine-grain parallelism.
Moreover, static scheduling dramatically simplifies the physical
implementation, significantly increasing the potential parallelism on a chip.
Our 225-core FPGA prototype running at 475 MHz outperforms a state-of-the-art
RTL simulator on an Intel Xeon processor running at 3.3 GHz by up to
27.9x (geomean 5.3x) in nine Verilog benchmarks
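The static bulk-synchronous model described in this abstract can be illustrated with a toy software sketch. This is not Manticore's compiler, instruction set, or hardware; the function `run_static_bsp`, the two-core partition, and the `count`/`acc` registers are hypothetical names chosen only for illustration. The point is that when every core's work per cycle is fixed in advance, each simulated clock cycle reduces to a compute phase that reads the previous state followed by a single publish step.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A process is an (owned register, next-state function) pair fixed ahead of time.
Process = Tuple[str, Callable[[State], int]]


def run_static_bsp(schedule: List[List[Process]], state: State, cycles: int) -> State:
    """Simulate `cycles` clock cycles under a fixed per-core schedule."""
    for _ in range(cycles):
        # Compute phase: every core reads only the previous cycle's state.
        updates: State = {}
        for core in schedule:
            for reg, next_fn in core:
                updates[reg] = next_fn(state)
        # Communication phase: all register writes become visible at once.
        state = {**state, **updates}
    return state


# Hypothetical two-core partition of a toy design:
# core 0 owns a counter, core 1 accumulates the counter's previous value.
schedule = [
    [("count", lambda s: s["count"] + 1)],
    [("acc", lambda s: s["acc"] + s["count"])],
]
final = run_static_bsp(schedule, {"count": 0, "acc": 0}, cycles=5)
print(final)  # {'count': 5, 'acc': 10}
```

Because both cores read the same snapshot, the result does not depend on the order in which the cores are visited, which is the property that makes a fixed schedule sufficient in this sketch.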
IMPACT: Interval-based Multi-pass Proteomic Alignment with Constant Traceback
Darwin is a genomics co-processor that achieved a 15000x acceleration on long read assembly through innovative hardware and algorithm co-design. Darwin's algorithms and hardware implementation were specifically designed for DNA analysis pipelines. This paper analyzes the feasibility of applying Darwin's algorithms to the problem of protein sequence alignment. In addition to a behavioral analysis of Darwin when aligning proteins, we propose an algorithmic improvement to Darwin's alignment algorithm, GACT, in the form of a multi-pass variant that increases its accuracy on protein sequence alignment. Concretely, our proposed multi-pass variant of GACT achieves on average 14% better alignment scores
Fundamentals of system-on-chip design on Arm Cortex-M microcontrollers
This textbook aims to provide learners with an understanding of embedded systems built around Arm Cortex-M processor cores, a popular CPU architecture often used in modern low-power SoCs that target IoT applications. Readers will be introduced to the basic principles of an embedded system from a high-level hardware and software perspective and will then be taken through the fundamentals of microcontroller architectures and SoC-based designs. Along the way, key topics such as chip design, the features and benefits of Arm’s Cortex-M processor architectures (including TrustZone, CMSIS and AMBA), interconnects, peripherals and memory management are discussed. The material covered in this book can be considered as key background for any student intending to major in computer engineering and is suitable for use in an undergraduate course on digital design
Melting of a phase change material in a horizontal annulus with discrete heat sources
Phase change materials have found many industrial applications such as
cooling of electronic devices and thermal energy storage. This paper
investigates numerically the melting process of a phase change material in a
two-dimensional horizontal annulus with different arrangements of two
discrete heat sources. The sources are positioned on the inner cylinder of
the annulus and assumed as constant-temperature boundary conditions. The
remaining portion of the inner cylinder wall as well as the outer cylinder
wall is considered to be insulated. The emphasis is mainly on the effects of
the arrangement of the heat source pair on the fluid flow and heat transfer
features. The governing equations are solved on a non-uniform O type mesh
using a pressure-based finite volume method with an enthalpy porosity
technique to trace the solid and liquid interface. The results are obtained
at Ra=10^4 and presented in terms of streamlines, isotherms, melting phase
front, liquid fraction and dimensionless heat flux. It is observed that,
depending on the arrangement of heat sources, the liquid fraction increases
both linearly and non-linearly with time but will slow down at the end of the
melting process. It can also be concluded that proper arrangement of discrete
heat sources has great potential for improving the energy storage system.
For instance, the arrangement C3 where the heat sources are located on the
bottom part of the inner cylinder wall can expedite the melting process as
compared to the other arrangements
A Dynamically Reconfigurable Platform for High-Performance and Low-Power On-Board Processing
FPGAs (Field Programmable Gate Arrays) are an attractive technology for high-speed data processing in space missions due to their unbeatable flexibility and best performance-to-power ratio in comparison to software. However, FPGAs suffer from three major drawbacks: (1) higher programming effort is required with respect to software; (2) hardware resources need to be allocated for each implemented function, in contrast to software functions, which can be executed on the same processing hardware; and (3) FPGAs are required to adopt radiation hardening techniques when deployed in a space environment. This paper presents a reconfigurable platform that demonstrates how modern FPGAs can be considered as computing resources like any other, suitable for emerging space applications and not subject to the above-mentioned drawbacks. In particular, we show that large FPGAs can be split into different regions containing concurrently running accelerators, which can support the execution of a single application or multiple applications. Then, in the same way as software-based multiprogrammed and multithreaded systems can dynamically create, schedule and execute threads, FPGA-based accelerators can be swapped in and out according to scheduling needs by exploiting their dynamic partial reconfiguration capability. A proof-of-concept cloud detection algorithm for Sentinel-2 multispectral images has been implemented and tested on our platform to validate the system's design principles and performance
Numerical study of melting in an annular enclosure filled with nano-enhanced phase change material
Heat transfer enhancement during melting in a two-dimensional cylindrical
annulus through dispersion of nanoparticles is investigated numerically.
Paraffin-based nanofluid containing various volume fractions of Cu is
applied. The governing equations are solved on a non-uniform O type mesh
using a pressure-based finite volume method with an enthalpy porosity
technique to trace the solid and liquid interface. The effects of
nanoparticle dispersion into pure fluid as well as the influences of some
significant parameters, namely, nanoparticle volume fraction and natural
convection on the fluid flow and heat transfer features are studied. The
results are presented in terms of streamlines, isotherms, temperatures and
velocity profiles and dimensionless heat flux. It is found that the suspended
nanoparticles give rise to a higher thermal conductivity as compared to the
pure fluid and consequently the heat transfer is enhanced. In addition, the
heat transfer rate and the melting time increase and decrease,
respectively, as the volume fraction of nanoparticles increases
CytoKavosh: a cytoscape plug-in for finding network motifs in large biological networks.
Network motifs are small connected sub-graphs that have recently gathered much attention as a means of discovering the structural behavior of large and complex networks. Finding motifs of any size is one of the most important problems in complex and large networks, and it requires fast and reliable algorithms and tools. CytoKavosh is one of the best choices for finding motifs of any given size in any complex network. It relies on a fast algorithm, Kavosh, which makes it faster than other existing tools. The Kavosh algorithm applies some well-known algorithmic features and includes subtle techniques that make it efficient in this field. CytoKavosh is a Cytoscape plug-in that finds motifs of a given size in a network (directed or undirected) that has previously been loaded into the Cytoscape workspace. The high performance of CytoKavosh is achieved by dynamically linking highly optimized functions of Kavosh's C++ implementation to the Cytoscape Java program, which makes this plug-in suitable for analyzing large biological networks. Significant attributes of CytoKavosh are its efficiency in time and memory usage and the absence of any implementation-related limit on motif size. CytoKavosh is implemented in the visual environment of Cytoscape, which makes it convenient for users to interact with it and to create visual options for analyzing the structural behavior of a network. The plug-in can work on any given network, is very simple to use, and generates graphical results of the discovered motifs with any required details. No other Cytoscape plug-in is dedicated to finding network motifs based on the original concept. We therefore introduce CytoKavosh as the first such plug-in, and we hope that it can be extended with further options to make it the best motif-analyzing tool
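As a rough illustration of the subgraph-counting step that underlies motif finding, the sketch below counts connected three-node subgraphs in a small undirected network. It is a brute-force enumeration written for clarity, not the Kavosh algorithm and not CytoKavosh's interface; the function name `count_size3_subgraphs` and the example edge list are assumptions made for this example. Motif analysis in general also compares such counts against randomized networks, a step omitted here.

```python
from collections import Counter
from itertools import combinations


def count_size3_subgraphs(edges):
    """Count induced connected 3-node subgraphs of an undirected graph."""
    # Build an adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = Counter()
    # Brute force: inspect every unordered triple of nodes.
    for a, b, c in combinations(sorted(adj), 3):
        n_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if n_edges == 3:
            counts["triangle"] += 1
        elif n_edges == 2:
            # Two edges among three nodes always form a connected path.
            counts["path"] += 1
    return counts


# Hypothetical example network: a triangle 1-2-3 with a pendant node 4.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
result = count_size3_subgraphs(example_edges)
print(dict(result))  # {'triangle': 1, 'path': 2}
```

Enumerating every triple costs time cubic in the number of nodes, which is why dedicated algorithms are needed for large networks and larger motif sizes.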
Control Panel of CytoKavosh, including the CytoKavosh control tab for getting input parameters.
The right side of the figure shows the ‘results’ table panel after running CytoKavosh for the given input parameters. A table for each run of the plug-in appears in a separate tab in the ‘results’ panel. These tabs keep the results until the plug-in is closed. For larger motif sizes, the number of detected motifs increases exponentially, so the ‘results’ table can be explored page by page. The bottom panel shows the graphical representation of the motif selected in the table.